The statement on page 6 (Sec. 3) that D-D-D is the minimizing
open-loop control sequence is in error. For the
example problem given in the Memorandum, the sequence
U-U-D is superior. Hence, for this example, the open-loop-optimal
feedback scheme discussed on page 8 (Sec. 3)
duplicates the pure feedback scheme.

However, if the arc number associated with going up
from A were changed from 5 to 10, the reader can
verify that the open-loop-optimal feedback solution
still chooses a U decision at A, but optimal pure
feedback dictates a D decision with an associated
lower expected sum.
PREFACE

Part  of  the  research  program  of  The  RAND  Corporation 
consists  of  basic  supporting  studies  in  mathematics,  one 
aspect  of  which  is  concerned  with  optimization  processes. 
This  Memorandum  is  concerned  with  optimal  control  of 
dynamic  systems  involving  random  variables.  Optimal 
control  rules  are  developed  and  evaluated. 

Optimization is particularly important in determining
rocket trajectories and correcting deviations in
flight from the predetermined trajectory.


SUMMARY 


The optimal control of stochastic systems is considered.
Under various assumptions concerning the information
available to the controller, different optimal control
rules result. For certain specific problems, the different
control schemes are analyzed and compared, and the vast
superiority of feedback over open-loop control is demonstrated.
SOME TYPES OF OPTIMAL CONTROL
OF STOCHASTIC SYSTEMS

1.  INTRODUCTION 

A stochastic system (i.e., a dynamic system involving
random variables) that evolves according to a rule that
also involves variables or parameters under external control
is called a stochastic control system. If these
variables or parameters are determined so that the system
behaves as well as possible, as measured by some well-defined
criterion, one has achieved optimal control of
the stochastic system.

Under  varying  assumptions  concerning  the  information 
available  to  the  controller,  different  optimal  control 
policies  result.  In  this  Memorandum  we  shall  develop  and 
illustrate  several  different  control  schemes  and  compare 
their behavior. In this way we intend to demonstrate
that certain control philosophies that may appear superficially
to be equivalent are really quite different.

In  the  final  section  we  derive  asymptotic  expressions  for 
the  cost  of  optimal  control  using  several  different  schemes. 
This  yields  a  quantitative  measure  of  the  vast  superiority 
of  feedback  over  open-loop  control  for  a  particular 
stochastic  system. 
2. A DETERMINISTIC PROBLEM

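Let us begin by considering a trivial three-stage
discrete deterministic control problem. Given the directed
network shown below,

[Fig. 1: directed network of arcs leading from point A to the terminal line B, with a number written along each arc; figure not reproduced]

we wish to determine that path from point A to line B
which has the minimal sum of the numbers written along
the three arcs of the path.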

Let us denote a decision to follow the diagonally-up
arc from an intersection by U and the diagonally-down
arc by D. By examining all eight possible paths from
A to B, we discover that the path D-U-D (diagonally down,
then up, then down) has sum-of-arc-numbers zero and is
the unique optimal solution. We shall call such a designation
of the solution, giving the sequence of control
decisions to be followed from a specific initial point to
termination, the optimal open-loop control.

A second way of presenting the solution to this problem
is to associate with each node of the figure a decision,
either U or D, such that that decision is the initial one
of the optimal path from the node to the terminal line.
This set of decisions assigned to nodes is most efficiently
determined recursively backwards from the terminal line [1].
We initially record the optimal decisions and minimal
sum to termination (encircled) at the nodes along the line
C in the figure below,

[Fig. 2: the network with decisions and encircled minimal sums recorded at the nodes along line C; figure not reproduced]

and then use the circled numbers to determine the optimal
decisions and sum along line D and, finally, from A. The resulting
figure is

[Fig. 3: the network with the optimal decision and encircled minimal sum recorded at every node; figure not reproduced]
We  shall  call  such  a  designation  of  the  solution,  giving 
the  optimal  decision  associated  with  starting  at  each 
possible  state  of  the  system  (i.e.,  at  each  node),  the 
feedback  optimal  control. 

The  interpretation  of  Fig.  3  is  that  the  optimal  path 
starting  at  point  A  has  sum  zero  and  starts  diagonally 
down.  The  node  reached  after  making  the  downward  move  has 
a U written by it, indicating a decision to go diagonally
up.  This  leads  to  a  node  with  a  down  decision.  Hence, 
D-U-D  is  the  optimal  path  from  A.  Note  that  the  feedback 
representation  of  the  solution  also  yields  the  best  path 
starting  from  other  nodes  not  along  the  D-U-D  path. 

The  important  point  is  that  for  a  specified  initial 
point  such  as  A,  the  open-loop  and  feedback  solutions 
are  equivalent  for  a  deterministic  process. 
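
The equivalence can be checked mechanically on a small example. The following is a minimal sketch, not the memorandum's own computation: the arc numbers in `UP` and `DOWN` are hypothetical (the figure is not reproduced here), and the node indexing `(k, i)` is an assumption made for illustration. It compares exhaustive examination of the eight decision sequences with the backward recursion.

```python
from itertools import product

# Hypothetical arc numbers (NOT those of the memorandum's figure).
# Node (k, i): stage k = 0, 1, 2 and i = number of upward moves made so far.
# UP[k][i] / DOWN[k][i]: number on the diagonally-up / diagonally-down arc leaving (k, i).
UP = [[5], [1, 0], [4, 2, 0]]
DOWN = [[0], [7, 3], [1, 6, 2]]
STAGES = 3


def path_sum(decisions):
    """Sum of arc numbers along the path produced by a decision sequence from A."""
    i, total = 0, 0
    for k, decision in enumerate(decisions):
        if decision == "U":
            total += UP[k][i]
            i += 1
        else:
            total += DOWN[k][i]
    return total


def open_loop_optimum():
    """Examine all eight sequences and return (minimal sum, sequence)."""
    return min((path_sum(seq), "-".join(seq)) for seq in product("UD", repeat=STAGES))


def feedback_optimum():
    """Backward recursion: minimal sum to termination and best decision at every node."""
    value = {(STAGES, i): 0 for i in range(STAGES + 1)}
    decision = {}
    for k in range(STAGES - 1, -1, -1):
        for i in range(k + 1):
            up = UP[k][i] + value[(k + 1, i + 1)]
            down = DOWN[k][i] + value[(k + 1, i)]
            value[(k, i)] = min(up, down)
            decision[(k, i)] = "U" if up < down else "D"
    return value, decision
```

Under these assumptions, `open_loop_optimum()[0]` and `feedback_optimum()[0][(0, 0)]` agree, and following the recorded decisions from A traces out a path with that minimal sum. The recursion additionally yields the best decision at nodes off that path.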

3.  A  STOCHASTIC  PROBLEM 

Let us now modify the above problem by introducing a
stochastic aspect. We shall assume that the decision
designated by U results in a probability of 3/4ths of
moving diagonally up and 1/4th of moving down. The alternative
decision, D, has a 3/4ths chance of a diagonally
downward move and a 1/4th chance of an upward transition.

We now have a stochastic control problem. We can still
exert a controlling influence, but randomness determines
the actual transformation of state.

As a criterion for comparing possible control schemes,
let us attempt to minimize the expected sum along the
path from A to line B.

To determine the best open-loop control policy, we
consider all eight possible sequences of decisions and
choose the one with minimal expected sum. For example,
the decision sequence U-U-U has probability 27/64ths of
actually yielding the path U-U-U with sum 5, 9/64ths probability
of yielding the path D-U-U with sum 1200, etc.
Multiplying the probabilities by the sums and adding, we
get an expected sum given by

$$E_{UUU} = \tfrac{27}{64} \cdot 5 + \tfrac{9}{64}(1200 + 1205 + 5) + \tfrac{3}{64}(5 + 0 + 12) + \tfrac{1}{64} \cdot 12 \approx 360 .$$

It turns out that the sequence D-D-D has the minimal expected
sum of approximately 120.

The best feedback control is computed recursively
backwards just as in the deterministic example. Suppose
that, for a given node, the expected sums starting at each
of the two possible nodes to which one might go have been
determined. Then the expected sum from the given node to
the termination under decision U is obtained by multiplying
the upward arc number plus the remaining expected sum
associated with the node at the end of the up-arc by
3/4ths and adding 1/4th times the corresponding downward
numbers. Decision D is similarly evaluated reversing the
3/4ths and 1/4th, and the minimal expected sum is chosen.
The minimizing decision and expected sum (encircled) are
recorded at the node. This computation leads to the
figure below:
[Figure 4: the network with the minimizing decision and encircled expected sum recorded at each node; figure not reproduced]

The  expected  sum  using  feedback  control  is  81  and  the 
control  policy  is  the  set  of  letters  associated  with  the 
nodes  in  Fig.  4. 

At this point we would like to introduce a third
control scheme. Let us use the optimal open-loop solution to
yield our initial decision. Then, after a transition has
occurred, let us observe the result and determine the
best open-loop solution for the new two-stage problem.
After implementing the initial control decision of this
optimal open-loop solution, we again observe the state
and use the optimal control decision for the remaining
one-stage problem. This scheme uses the optimal open-loop
initial decision at each stage, but incorporates feedback
in the observation of the actual state attained. We call
this scheme open-loop-optimal feedback control.

This control scheme differs from either of the previous
two. The initial optimal open-loop decision agrees
with the feedback decision except for starting at node A.
There, as has been shown, the optimal open-loop control
dictates a downward decision. Therefore, the expected
cost of the above scheme is

$$\tfrac{1}{4} \cdot 80 + \tfrac{3}{4} \cdot 84 = 83 .$$
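
As a check on the arithmetic, one can compare the two possible decisions at A. This is a sketch under an assumption, since the figure is not reproduced: 80 and 84 are read as the totals (arc number plus remaining expected sum) reached from A via the up-arc and the down-arc, respectively. Under that reading, a U decision at A reproduces the feedback value of 81, while a D decision gives 83:

$$\text{U at } A:\ \tfrac{3}{4} \cdot 80 + \tfrac{1}{4} \cdot 84 = 81, \qquad \text{D at } A:\ \tfrac{1}{4} \cdot 80 + \tfrac{3}{4} \cdot 84 = 83 .$$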

We can conclude from this example that

1) The pure open-loop scheme incorporating no use
of subsequent information about actual transitions
yields a large expected sum;

2) The pure feedback scheme where the state is assumed
known when the decision is made yields the
smallest possible expected sum for a stochastic
problem;

3) The open-loop-optimal feedback scheme yields an
intermediate expected sum. Although feedback is
used, the fact that feedback is to be used is
withheld from the computation determining the
control decisions, which results in an inferior
control scheme.
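
The three schemes can be compared numerically on a small network. The sketch below is illustrative only: the arc numbers in `UP` and `DOWN` are hypothetical (they are not the memorandum's figure), and the function names and node indexing `(k, i)` are assumptions. The 3/4ths transition rule is the one stated in the text.

```python
from itertools import product

P = 0.75   # probability that the chosen direction is the one actually taken
N = 3      # number of stages

# Hypothetical arc numbers; node (k, i) = stage k, i upward moves made so far.
UP = [[5], [1, 0], [4, 2, 0]]
DOWN = [[0], [1200, 3], [1, 6, 2]]


def open_loop_cost(seq, k=0, i=0, p=P):
    """Expected sum from node (k, i) when the fixed sequence seq is applied blindly."""
    if not seq:
        return 0.0
    p_up = p if seq[0] == "U" else 1.0 - p
    return (p_up * (UP[k][i] + open_loop_cost(seq[1:], k + 1, i + 1, p))
            + (1.0 - p_up) * (DOWN[k][i] + open_loop_cost(seq[1:], k + 1, i, p)))


def best_open_loop(k=0, i=0, p=P):
    """Best fixed sequence for the remaining stages: (expected sum, sequence)."""
    return min((open_loop_cost(seq, k, i, p), "".join(seq))
               for seq in product("UD", repeat=N - k))


def feedback_cost(k=0, i=0, p=P):
    """Expected sum under pure feedback, by backward recursion."""
    if k == N:
        return 0.0
    up_total = UP[k][i] + feedback_cost(k + 1, i + 1, p)
    down_total = DOWN[k][i] + feedback_cost(k + 1, i, p)
    return min(p * up_total + (1.0 - p) * down_total,    # decide U
               (1.0 - p) * up_total + p * down_total)    # decide D


def olof_cost(k=0, i=0, p=P):
    """Expected sum when each decision is the first letter of the best
    open-loop sequence recomputed from the observed node."""
    if k == N:
        return 0.0
    first = best_open_loop(k, i, p)[1][0]
    p_up = p if first == "U" else 1.0 - p
    return (p_up * (UP[k][i] + olof_cost(k + 1, i + 1, p))
            + (1.0 - p_up) * (DOWN[k][i] + olof_cost(k + 1, i, p)))


print(best_open_loop(), olof_cost(), feedback_cost())
```

For any choice of arc numbers the feedback value is no larger than the open-loop-optimal feedback value, which in turn is no larger than the best open-loop value. Setting `p=1.0` removes the randomness and makes all three coincide, as in Sec. 2.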




4.  A  CONTINUOUS  DETERMINISTIC  PROBLEM 

Let us now consider briefly a standard continuous
non-stochastic control problem. Given an initial time
$t_0$ and final time $T$, we wish to use control $u(t)$,
$t_0 \le t \le T$, so as to guide a particle, initially in state
$x_0$, toward the origin $x = 0$. We attach a cost to using
control and attempt to minimize the criterion function

$$\int_{t_0}^{T} u^2(t)\,dt + x^2(T) \tag{4.1}$$

where the first term represents the cost of control and
the second term measures the deviation from the origin at
the terminal time. Motion of the particle is given by the
linear differential equation

$$\dot{x}(t) = a x(t) + b u(t) . \tag{4.2}$$

This is a linear control problem with quadratic criterion,
and has been much analyzed. We consider it briefly here
in order to acquaint the reader with the type of problem
we shall consider subsequently and with the dynamic programming
technique of solution.

The classical necessary conditions for an extremum
of the above problem are given in terms of an adjoint
variable or Lagrange multiplier $\lambda$ which satisfies the
equation

$$\dot{\lambda} = -a\lambda \tag{4.3}$$

and terminal condition

$$\lambda(T) = 2x(T) . \tag{4.4}$$

The optimal control is given by the condition

$$2u + \lambda b = 0 . \tag{4.5}$$

Solution of (4.3) with boundary condition (4.4) yields

$$\lambda(t) = 2x(T)\,e^{a(T-t)} \tag{4.6}$$

and therefore,

$$u(t) = -x(T)\,b\,e^{a(T-t)} \tag{4.7}$$

so u(t) varies exponentially with time. The unknown
terminal value of x, x(T), can be expressed in terms of
x(t) by substituting the control rule (4.7) in (4.2) and
solving. The resulting expression for x(t) in terms of
x(T) can be inverted, and the control at time t is then
given in terms of the state at time t by equation (4.7).
Performing these steps we get
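
$$x(t) = x(t_0)\,e^{a(t-t_0)} + \frac{x(T)\,b^2}{2a}\,e^{a(T-t)} - \frac{x(T)\,b^2}{2a}\,e^{a(T-2t_0+t)} \tag{4.8}$$

$$x(t_0) = \left[ 1 - \frac{b^2}{2a} + \frac{b^2}{2a}\,e^{2a(T-t_0)} \right] e^{-a(T-t_0)}\,x(T) \tag{4.9}$$

$$x(T) = \frac{e^{a(T-t)}\,x(t)}{1 - \frac{b^2}{2a} + \frac{b^2}{2a}\,e^{2a(T-t)}} \tag{4.10}$$

$$u(t) = -\frac{b\,e^{2a(T-t)}\,x(t)}{1 - \frac{b^2}{2a} + \frac{b^2}{2a}\,e^{2a(T-t)}} \tag{4.11}$$
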

This  is  the  feedback  solution  for  control  as  a  function 
of  state.  The  optimal  control  is  exponential  in  time,  or, 
for  a  given  time,  it  is  a  linear  function  of  the  state. 

The dynamic programming solution of this problem
proceeds as follows: Define an auxiliary function f(x,t)
as the minimal obtainable value of the criterion function
(4.1) if we start the problem in state x at time t,
$t_0 \le t \le T$. By the principle of optimality linking the
initial decision with the remaining optimal decisions, we
have

$$f(x,t) = \min_{u(t)} \left[ u^2(t)\,dt + f\big(x + (ax + bu)\,dt,\; t + dt\big) \right] . \tag{4.12}$$

Expanding (4.12) in Taylor series, dividing by dt and
letting dt approach 0, we get

$$0 = \min_u \left[ u^2 + (ax + bu)\,\frac{\partial f}{\partial x} + \frac{\partial f}{\partial t} \right] . \tag{4.13}$$

Differentiating with respect to u to minimize gives

$$2u + b\,\frac{\partial f}{\partial x} = 0 \tag{4.14}$$

and substituting u determined by (4.14) in (4.13), we
obtain the non-linear partial differential equation

$$0 = -\frac{b^2}{4} \left( \frac{\partial f}{\partial x} \right)^2 + ax\,\frac{\partial f}{\partial x} + \frac{\partial f}{\partial t} . \tag{4.15}$$

Assuming f(x,t) has the separable form $g(t)x^2$ and substituting
in (4.15), we find that g(t) satisfies the
Riccati ordinary differential equation

$$-b^2 g^2(t) + 2a\,g(t) + g'(t) = 0 \tag{4.16}$$

with

$$g(T) = 1 . \tag{4.17}$$

Solution of this equation yields

$$g(t) = \frac{e^{2a(T-t)}}{1 - \frac{b^2}{2a} + \frac{b^2}{2a}\,e^{2a(T-t)}} \tag{4.18}$$

whence

$$f(x,t) = \frac{e^{2a(T-t)}\,x^2}{1 - \frac{b^2}{2a} + \frac{b^2}{2a}\,e^{2a(T-t)}} . \tag{4.19}$$

Substitution in (4.14) yields the control scheme

$$u(t) = -\frac{b\,e^{2a(T-t)}}{1 - \frac{b^2}{2a} + \frac{b^2}{2a}\,e^{2a(T-t)}}\,x(t) \tag{4.20}$$

which agrees with (4.11). Again, as in Sec. 2, we see
that for a deterministic problem the open-loop and feedback
solutions are equivalent.
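
The closed form can be checked numerically. The sketch below is an illustration, not part of the original analysis: it assumes $a \ne 0$, takes $g(t)$ to be the ratio $e^{2a(T-t)} / \left[1 - b^2/2a + (b^2/2a)\,e^{2a(T-t)}\right]$ that solves (4.16) with $g(T) = 1$, and uses hypothetical function names, a finite-difference derivative, and a simple Euler integration of (4.2) under the linear feedback law $u = -b\,g(t)\,x$.

```python
import math


def g(t, T, a, b):
    """Solution of the Riccati equation (4.16) with g(T) = 1 (assumes a != 0)."""
    e = math.exp(2.0 * a * (T - t))
    k = b * b / (2.0 * a)
    return e / (1.0 - k + k * e)


def riccati_residual(t, T, a, b, h=1e-5):
    """Left side of (4.16), with g'(t) approximated by a central difference."""
    dg = (g(t + h, T, a, b) - g(t - h, T, a, b)) / (2.0 * h)
    gt = g(t, T, a, b)
    return -b * b * gt * gt + 2.0 * a * gt + dg


def closed_loop_cost(x0, t0, T, a, b, steps=20000):
    """Euler integration of (4.2) under u = -b g(t) x, accumulating criterion (4.1)."""
    dt = (T - t0) / steps
    x, cost = x0, 0.0
    for n in range(steps):
        t = t0 + n * dt
        u = -b * g(t, T, a, b) * x
        cost += u * u * dt
        x += (a * x + b * u) * dt
    return cost + x * x
```

For example, with `a=0.3`, `b=1.2`, `T=2.0`, the value `riccati_residual(1.0, 2.0, 0.3, 1.2)` should be close to zero, and `closed_loop_cost(1.0, 0.0, 2.0, 0.3, 1.2)` should be close to `g(0.0, 2.0, 0.3, 1.2)`, the value of $f(x_0, t_0) = g(t_0)\,x_0^2$ for $x_0 = 1$.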

5.  A  CONTINUOUS  STOCHASTIC  PROBLEM  [2-5] 

To construct a stochastic control problem, we attach
a random variable to the equation defining the evolution
of x. We write the discrete rule

$$x(t + \Delta t) = x(t) + [a x(t) + b u(t)]\,\Delta t + \xi(t) \tag{5.1}$$

where $\xi(t)$ is a stochastic process with, for all t,

(1) $E(\xi(t)) = 0$   (5.2)

(2) $E(\xi^2(t)) = \sigma^2 \Delta t$   (5.3)

(3) $E(\xi^n(t)) = o(\Delta t)$, $n > 2$   (5.4)

(4) $\xi(t_1), \ldots, \xi(t_n)$ are independent for any finite collection of distinct times $t_1, \ldots, t_n$   (5.5)

where E is the expected value operator, $\sigma^2$ is a constant,
and $x = o(\Delta t)$ means the limit as $\Delta t \to 0$ of $x / \Delta t$ is zero.
The limiting process as $\Delta t \to 0$ is the continuous control
problem we shall consider. Our criterion function to be
minimized is

$$E \left[ \int_{t_0}^{T} u^2(t)\,dt + x^2(T) \right] , \tag{5.6}$$

the expected cost of control plus terminal deviation.

The optimal open-loop control is deduced by considering
all possible functions u(t), $t_0 \le t \le T$, and choosing
the one that minimizes the criterion (5.6). The cost of
control integral is deterministic. Furthermore, if x(T)
is viewed, at the initial time $t_0$, as a random variable
dependent upon u(t), one notes that the variance $\sigma^2_{x(T)}$
of this random variable is independent of u(t). Since the
expected value of the square of a random variable is its
mean squared plus its variance, we have

$$E(x^2(T)) = [E(x(T))]^2 + \sigma^2_{x(T)} \tag{5.7}$$

so we wish to choose that u(t) which minimizes

$$\int_{t_0}^{T} u^2\,dt + [E(x(T))]^2 . \tag{5.8}$$

Due to the linearity of the equation of evolution (5.1),
the expected value of x(T) is the value of x(T) that results
from integrating (5.1) with forcing function u(t)
and with the stochastic process $\xi(t)$ replaced by its mean
value at each time, zero. Hence, our problem reduces, for
the special assumptions of linear equations and quadratic
criterion, to precisely the deterministic problem that we
solved in the previous section.

This observation leads to a fourth control scheme,
called certainty equivalent control [6]. This scheme replaces
the random variables in the stochastic problem by
their expected values and solves the resulting deterministic
control problem. Certainty equivalent control is seen to
be equivalent to optimal open-loop control in the above
example.

To obtain the open-loop-optimal feedback control for
the above problem, we express the control as a function
of state, as was done in equation (4.11), and use that
control having observed the state transition. The actual
realization of the control function then depends upon the
realization of the stochastic process; one expects this
scheme to perform better than the pure open-loop solution.

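The pure feedback control law can be derived by
dynamic programming. One defines f(x,t) as the minimal
value of (5.6), and writes

$$f(x,t) = \min_u E_\xi \left[ u^2 \Delta t + f\big(x + (ax + bu)\,\Delta t + \xi,\; t + \Delta t\big) \right] . \tag{5.9}$$

Hence, expanding in series and taking expectations using
(5.2) through (5.5),

$$0 = \min_u \left[ u^2 + \frac{\partial f}{\partial x}(ax + bu) + \frac{1}{2}\sigma^2\,\frac{\partial^2 f}{\partial x^2} + \frac{\partial f}{\partial t} \right] . \tag{5.10}$$

Therefore,

$$u = -\frac{b}{2}\,\frac{\partial f}{\partial x} \tag{5.11}$$

and we must solve the equation

$$0 = -\frac{b^2}{4} \left( \frac{\partial f}{\partial x} \right)^2 + ax\,\frac{\partial f}{\partial x} + \frac{1}{2}\sigma^2\,\frac{\partial^2 f}{\partial x^2} + \frac{\partial f}{\partial t} . \tag{5.12}$$
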
Letting

$$f(x,t) = g(t)x^2 + h(t), \qquad g(T) = 1, \qquad h(T) = 0 , \tag{5.13}$$

we find that g(t) satisfies the same equation, (4.16), as
in the deterministic case. Since the optimal control only
involves g(t), we have the same control rule as in Sec. 4,
but not the same expected cost, due to the h(t) term reflecting
the cost of the randomness. Hence, the optimal
feedback control duplicates the open-loop-optimal feedback
scheme.
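
The h(t) term can be made explicit. As a sketch of the step that is not spelled out, take the stochastic counterpart of the dynamic programming equation of Sec. 4, which carries the additional term $\tfrac{1}{2}\sigma^2\,\partial^2 f / \partial x^2$, substitute $f(x,t) = g(t)x^2 + h(t)$, and collect powers of x:

$$0 = x^2 \left[ -b^2 g^2(t) + 2a\,g(t) + g'(t) \right] + \left[ \sigma^2 g(t) + h'(t) \right] .$$

The coefficient of $x^2$ reproduces (4.16), and the remaining term, together with $h(T) = 0$, gives

$$h(t) = \sigma^2 \int_t^T g(s)\,ds ,$$

which is the additional expected cost attributable to the randomness.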

These equivalences of various control schemes are
unusual and are the result of our many assumptions of
linearity and quadraticity. In the next section we shall
modify the problem slightly and demonstrate the dissimilarity
of the four different control philosophies we
have distinguished.

6.  ANOTHER  CONTINUOUS  STOCHASTIC  PROBLEM 

We now modify the above problem slightly. We assume
that the variance of the random variable $\xi$ in equation (5.1)
depends upon the control decision, with no randomness in
the evolution of x if no control is exerted. This assumption
reflects reality in many applications. We replace
(5.3) by the equation

$$E(\xi^2(t)) = u^2 \sigma^2 \Delta t \tag{6.1}$$

where $\sigma^2$ is a constant. We neglect the cost of control
integral in the objective function (5.6), since the cost
of control is now reflected in the uncertainty attendant
upon the use of control. Our criterion function is now
merely

$$E \left[ x^2(T) \right] . \tag{6.2}$$

For simplicity, we take $a = 0$ in the equation of evolution
(5.1), and use the continuous limit of

$$x(t + \Delta t) = x(t) + [b u(t)]\,\Delta t + \xi(t) . \tag{6.3}$$

We first consider optimal open-loop control. The
variance of the random variable x(T) as viewed at time $t_0$
is

$$\int_{t_0}^{T} u^2(t)\,\sigma^2\,dt \tag{6.4}$$

and the criterion function equals

$$[E(x(T))]^2 + \int_{t_0}^{T} u^2 \sigma^2\,dt . \tag{6.5}$$
By the same reasoning as above, the expected value of
x(T) is the value yielded by replacing the random variable
$\xi$ at each time t by its mean, zero. We therefore have
the same problem as in Sec. 4 and Sec. 5, except for a
factor $\sigma^2$ in the criterion function and no ax term in the
equation of motion. The adjoint variable $\lambda(t)$ is, in this
case, a constant with terminal value 2E(x(T)). The optimal
control is given by

$$u(t) = -\frac{E(x(T))\,b}{\sigma^2} \tag{6.6}$$

and is a constant function of time. Expressed in terms
of state, we have

$$u(t) = -\frac{x(t)}{b \left( T - t + \frac{\sigma^2}{b^2} \right)} \tag{6.7}$$

which, as before, is linear in the state at a given time.
Using open-loop control, the expected terminal value of
x, if we start at time $t_0$ in state $x(t_0)$, is

$$E[x(T)] = \frac{\sigma^2\,x(t_0)}{b^2 \left( T - t_0 + \frac{\sigma^2}{b^2} \right)} \tag{6.8}$$

and the variance of the random variable x(T) is given by

$$\sigma^2_{x(T)} = \frac{\sigma^2\,x^2(t_0)\,(T - t_0)}{b^2 \left( T - t_0 + \frac{\sigma^2}{b^2} \right)^2} . \tag{6.9}$$

Hence, the value of the criterion function is given by

$$E(x^2(T)) = [E(x(T))]^2 + \sigma^2_{x(T)} = \frac{\sigma^2\,x^2(t_0)}{b^2 \left( T - t_0 + \frac{\sigma^2}{b^2} \right)} . \tag{6.10}$$
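
The open-loop result can be checked directly. The sketch below is illustrative: the function names and the parameter `horizon` (standing for $T - t_0$) are assumptions. It evaluates the criterion for an arbitrary constant control u as mean squared plus variance, and compares it with the constant control $u = -x(t_0) / \left[ b \left( T - t_0 + \sigma^2 / b^2 \right) \right]$ and the closed-form criterion value $\sigma^2 x^2(t_0) / \left[ b^2 \left( T - t_0 + \sigma^2 / b^2 \right) \right]$.

```python
def open_loop_criterion(u, x0, horizon, b, sigma):
    """E[x^2(T)] for a constant open-loop control u applied over T - t0 = horizon."""
    mean = x0 + b * u * horizon              # expected terminal state
    variance = sigma**2 * u**2 * horizon     # variance integral with constant u
    return mean**2 + variance


def open_loop_control(x0, horizon, b, sigma):
    """Optimal constant open-loop control."""
    return -x0 / (b * (horizon + sigma**2 / b**2))


def open_loop_value(x0, horizon, b, sigma):
    """Closed-form value of the criterion under the optimal open-loop control."""
    return sigma**2 * x0**2 / (b**2 * (horizon + sigma**2 / b**2))
```

For any parameter values, `open_loop_criterion(open_loop_control(...), ...)` should match `open_loop_value(...)`, and perturbing the control in either direction should increase the criterion, since it is a convex quadratic in u.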


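We next analyze the open-loop-optimal feedback control
scheme. This involves using the rule (6.7) for control as
a function of state. The equation of motion becomes

$$x(t + \Delta t) = x(t) - \frac{x(t)}{T - t + \frac{\sigma^2}{b^2}}\,\Delta t + \xi(t) . \tag{6.11}$$

If we define f(x,t) as the expected value of $x^2(T)$ using
the above rule, we have

$$f(x,t) = E_\xi \left[ f\left( x - \frac{x}{T - t + \frac{\sigma^2}{b^2}}\,\Delta t + \xi,\; t + \Delta t \right) \right] \tag{6.12}$$

which, after series expansion, letting $\Delta t \to 0$, and taking
the expectation, gives

$$0 = -\frac{x}{T - t + \frac{\sigma^2}{b^2}}\,\frac{\partial f}{\partial x} + \frac{x^2 \sigma^2}{2 b^2 \left( T - t + \frac{\sigma^2}{b^2} \right)^2}\,\frac{\partial^2 f}{\partial x^2} + \frac{\partial f}{\partial t} . \tag{6.13}$$
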
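Letting f(x,t) have the form

$$f(x,t) = g(t)x^2, \qquad g(T) = 1 , \tag{6.14}$$

we obtain the linear homogeneous equation for g(t)

$$g'(t) + \left[ -\frac{2}{T - t + \frac{\sigma^2}{b^2}} + \frac{\sigma^2}{b^2 \left( T - t + \frac{\sigma^2}{b^2} \right)^2} \right] g(t) = 0 \tag{6.15}$$

so that

$$f(x,t) = x^2 \exp \left\{ \int_t^T \left[ \frac{\sigma^2}{b^2 \left( T - \tau + \frac{\sigma^2}{b^2} \right)^2} - \frac{2}{T - \tau + \frac{\sigma^2}{b^2}} \right] d\tau \right\} \tag{6.16}$$

$$\phantom{f(x,t)} = x^2 \exp \left\{ 1 - \frac{\sigma^2}{b^2 \left( T - t + \frac{\sigma^2}{b^2} \right)} + 2 \log \frac{\sigma^2}{b^2} - 2 \log \left( T - t + \frac{\sigma^2}{b^2} \right) \right\} . \tag{6.17}$$
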

To evaluate the expected terminal x value, given that we
start in state $x(t_0)$ at time $t_0$, we can solve equation
(6.13) with solution of the form

$$f(x,t) = g(t)x, \qquad g(T) = 1 , \tag{6.18}$$

obtaining

$$E[x(T)] = \frac{\sigma^2\,x(t_0)}{b^2 \left( T - t_0 + \frac{\sigma^2}{b^2} \right)} . \tag{6.19}$$

This result is the same as the pure open-loop result (6.8),
which is explained by the linearity of the process.

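Analysis of the feedback scheme begins with the definition
of f(x,t) as the value of the criterion if we start
in state x at time t, $t_0 \le t \le T$, and use an optimal
policy. By the principle of optimality, we have

$$f(x,t) = \min_u E_\xi \left[ f\big(x + (bu)\,\Delta t + \xi,\; t + \Delta t\big) \right] \tag{6.20}$$

which yields

$$0 = \min_u \left[ bu\,\frac{\partial f}{\partial x} + \frac{u^2 \sigma^2}{2}\,\frac{\partial^2 f}{\partial x^2} + \frac{\partial f}{\partial t} \right] . \tag{6.21}$$

Hence, setting the derivative with respect to u equal to
zero to minimize,

$$u = -\frac{b\,\frac{\partial f}{\partial x}}{\sigma^2\,\frac{\partial^2 f}{\partial x^2}} \tag{6.22}$$

and, substituting (6.22) in (6.21),

$$0 = -\frac{b^2}{2\sigma^2}\,\frac{\left( \frac{\partial f}{\partial x} \right)^2}{\frac{\partial^2 f}{\partial x^2}} + \frac{\partial f}{\partial t} . \tag{6.23}$$

Letting

$$f(x,t) = g(t)x^2, \qquad g(T) = 1 , \tag{6.24}$$

we get

$$0 = -\frac{b^2}{\sigma^2}\,g(t) + g'(t) . \tag{6.25}$$

Solving for g(t),

$$f(x,t) = e^{-\frac{b^2}{\sigma^2}(T-t)}\,x^2 \tag{6.26}$$

and

$$u = -\frac{b\,x}{\sigma^2} . \tag{6.27}$$
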


If we now define h(x,t) to be the expected terminal x
value starting in state x at time t and using control
(6.27), we can characterize h(x,t) by

$$h(x,t) = E_\xi \left[ h\left( x - \frac{b^2 x}{\sigma^2}\,\Delta t + \xi,\; t + \Delta t \right) \right] \tag{6.28}$$

where the boundary condition is now

$$h(x,T) = x . \tag{6.29}$$

Letting

$$h(x,t) = g(t)x , \tag{6.30}$$

we find

$$h(x,t) = e^{-\frac{b^2}{\sigma^2}(T-t)}\,x . \tag{6.31}$$


The final control philosophy we have mentioned above
is certainty equivalent control, the optimal control for
the deterministic system that results from replacing all
random variables in the stochastic problem by their expected
values. This yields the problem: Choose u(t) so
that x(T) given by

$$\dot{x}(t) = b u(t), \qquad x(t_0) = x_0 \tag{6.32}$$

minimizes the expression

$$x^2(T) . \tag{6.33}$$

A little reflection shows that x(T) can be made zero by
any of an infinite class of controls, and the problem is
therefore not meaningful.

We are now in a position to recapitulate our results.
Foremost is the conclusion that the four different control
schemes give four different optimal control rules.
For open-loop control we have a rule given as a function
of time and, naturally, dependent upon $t_0$, $x(t_0)$, and $T$.
This rule, which never depends upon the realization of the
stochastic process and which, in our particular example,
is a constant function of time, is (by equations (6.6) and
(6.8))

$$u(t) = -\frac{x(t_0)}{b \left( T - t_0 + \frac{\sigma^2}{b^2} \right)} . \tag{6.34}$$

The open-loop-optimal feedback control law is expressed as
a function of current state and time and depends upon the
realization of the stochastic process. It does not depend
explicitly on the initial state or time. This law is
(equation (6.7))

$$u(t) = -\frac{x(t)}{b \left( T - t + \frac{\sigma^2}{b^2} \right)} . \tag{6.35}$$

Note that this law is the same as (6.34) initially (for
state $x(t_0)$ at time $t_0$) and that it duplicates (6.34) if
and only if the stochastic process takes on its mean value,
zero. The feedback control law depends on the current time
and state, just as does the above scheme. However, due to
the fact, stressed earlier, that the optimization mathematics
is aware of the feedback nature of the control, we
get a law different from (6.35); namely (equation 6.27)

$$u(t) = -\frac{b\,x(t)}{\sigma^2} \tag{6.36}$$

which, in this particular case, does not happen to depend
explicitly on the current time. The certainty equivalence
concept, as noted earlier, is inappropriate here and yields
no unique control law.
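
The expected terminal squared deviation under the two state-dependent laws can be checked by propagating the second moment through the discrete rule $x(t + \Delta t) = x(t) + b u \Delta t + \xi$. The sketch below is illustrative and rests on stated assumptions: the names `second_moment`, `olof_closed_form` and `feedback_closed_form` are hypothetical, the process starts at time zero, and the moment properties of $\xi$ (zero mean, second moment $u^2 \sigma^2 \Delta t$) are taken to hold conditionally on the current state. For a law of the form $u = -k(t)\,x$, each step then multiplies $E[x^2]$ by $(1 - b k \Delta t)^2 + k^2 \sigma^2 \Delta t$.

```python
import math


def second_moment(gain, x0, T, b, sigma, steps=50000):
    """Propagate E[x^2] from time 0 to T under the linear law u = -gain(t) * x."""
    dt = T / steps
    m2 = x0 * x0
    for n in range(steps):
        k = gain(n * dt)
        m2 *= (1.0 - b * k * dt) ** 2 + k * k * sigma**2 * dt
    return m2


def olof_closed_form(x0, T, b, sigma):
    """Closed form for open-loop-optimal feedback, evaluated at t = 0."""
    c = sigma**2 / b**2
    return x0**2 * math.exp(1.0 - c / (T + c)) * (c / (T + c)) ** 2


def feedback_closed_form(x0, T, b, sigma):
    """Closed form for pure feedback, evaluated at t = 0."""
    return x0**2 * math.exp(-(b**2) * T / sigma**2)


x0, T, b, sigma = 1.0, 5.0, 1.0, 1.0
olof_gain = lambda t: 1.0 / (b * (T - t + sigma**2 / b**2))   # open-loop-optimal feedback law
feedback_gain = lambda t: b / sigma**2                        # pure feedback law
print(second_moment(olof_gain, x0, T, b, sigma), olof_closed_form(x0, T, b, sigma))
print(second_moment(feedback_gain, x0, T, b, sigma), feedback_closed_form(x0, T, b, sigma))
```

With a small step the recursion should approach the closed-form values; the agreement is only approximate because $\Delta t$ is finite.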

If we examine the asymptotic behavior of the criterion
function for a long process ($T \to \infty$) starting at time zero
in state $x_0$, we see that the expected value of $x^2(T)$ approaches
zero in all cases. This is because for a long
process very little control is exerted at any particular
time, hence there is little randomness and we can steer
assuredly toward the origin. The nature of the approach
to zero as a function of the length of the process, T, is
significant. For open-loop control the approach is inverse-linear,
with (equation 6.10)

$$E[x^2(T)] \sim \frac{\sigma^2 x_0^2}{b^2 T} . \tag{6.37}$$

For open-loop-optimal feedback control we have inverse-square
convergence, with (equation 6.17)

$$E[x^2(T)] \sim \frac{\sigma^4 e\,x_0^2}{b^4 T^2} . \tag{6.38}$$

Finally, the feedback control scheme yields negative-exponential
convergence (equation 6.26)

$$E[x^2(T)] = e^{-\frac{b^2 T}{\sigma^2}}\,x_0^2 . \tag{6.39}$$

Both the open-loop and open-loop-optimal feedback
schemes can be expected to reach the same terminal x value
(equations 6.8 and 6.19), but due to its feedback nature,
the latter scheme has less variance associated with it.
The pure feedback control has an expected terminal value
much closer to the origin (equation 6.31) since one can
aim closer with the assurance that deviations resulting
from the randomness caused by the greater control will be
corrected later. Examining the control rules themselves
for a fixed initial point, one finds that the pure feedback
scheme calls for greater control. This can be explained
by the fact that the feedback scheme can afford to aim closer
to the origin in the assurance that overshooting due to
randomness can be caught and corrected. While the open-loop-optimal
feedback scheme will also catch and correct
overshoot, the computation of the control rule is not
cognizant of this fact and is, therefore, more conservative.
Pure open-loop control, of course, will not compensate.
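
The relative performance of the three schemes can be tabulated from the closed-form criterion values. The sketch below is illustrative: the function name `criterion_values` and the parameter choices are assumptions, the process is taken to start at time zero, and `c` abbreviates $\sigma^2 / b^2$.

```python
import math


def criterion_values(x0, T, b, sigma):
    """E[x^2(T)] starting at time zero in state x0, for the three schemes."""
    c = sigma**2 / b**2
    open_loop = c * x0**2 / (T + c)                                   # inverse-linear in T
    olof = x0**2 * math.exp(1.0 - c / (T + c)) * (c / (T + c)) ** 2   # inverse-square in T
    feedback = x0**2 * math.exp(-T / c)                               # negative-exponential in T
    return open_loop, olof, feedback


for T in (1.0, 10.0, 100.0):
    ol, of, fb = criterion_values(1.0, T, 1.0, 1.0)
    print(f"T={T:6.1f}  open-loop={ol:.3e}  open-loop-optimal feedback={of:.3e}  feedback={fb:.3e}")
```

For every horizon the feedback value is the smallest and the open-loop value the largest, and the gap widens rapidly as T grows.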

7.  CONCLUSION 

We see then that for any but the simplest stochastic
problems, the various control philosophies that are equivalent
for deterministic problems are quite dissimilar.
Further, we have obtained some quantitative idea of the
relative behavior and performance of several different
optimal control schemes.